A stability measure for selecting the number of groups in aggregated taxonomy using spectral analysis and the affinity propagation method

Dorota Rozmus

doi:https://doi.org/10.59139/ws.2024.03.1

Dorota Rozmus, Uniwersytet Ekonomiczny w Katowicach, Wydział Finansów, Polska / University of Economics in Katowice, Faculty of Finance, Poland. ORCID: https://orcid.org/0000-0002-0565-5319

Wiadomości Statystyczne. The Polish Statistician, vol. 69, 2024, no. 3, pp. 1–17. Published online: 2 April 2024.

How to cite: Rozmus, D. (2024). Miara stabilności w wyborze liczby grup w taksonomii zagregowanej z zastosowaniem analizy spektralnej i metody propagacji podobieństwa. Wiadomości Statystyczne. The Polish Statistician, 69(3), 1–17. https://doi.org/10.59139/ws.2024.03.1.



ABSTRACT

Since the 1990s, the aggregated (ensemble) approach and the stability of clustering methods have been frequent topics in taxonomy. Until recently they were considered separately, but the literature now offers a proposal that combines the two concepts: a stability measure, the proportion of ambiguously clustered pairs (PAC), which can be applied in the aggregated approach in taxonomy and is intended to serve as a criterion for selecting the optimal number of groups. The aim of the study discussed in the article is to compare the results of selecting the optimal number of groups in aggregated taxonomy, using the implementation of three Sustainable Development Goals in EU countries as an example. The PAC measure was used alongside selected classical indices: Caliński-Harabasz, Dunn and Davies-Bouldin. Affinity propagation and spectral clustering served as the base methods in the aggregated approach. The study was based on Eurostat data for 2019. The results show that both the criterion used to determine the number of groups and the base method used in aggregated taxonomy affect the final decision on the number of groups. Regardless of whether affinity propagation or spectral clustering was used with the classical indices, or whether these methods served as base methods in the aggregated approach with the number of groups chosen by means of PAC, the discrepancies in the indicated number of groups proved very large.
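The PAC criterion described above can be illustrated with a minimal sketch. It assumes the commonly used definition of PAC as the share of object pairs whose consensus index (the fraction of base partitions that place both objects in the same group) falls in an intermediate interval. The bounds 0.1 and 0.9, the function names and the toy partitions are assumptions made for illustration and are not taken from the article. In practice the base partitions would come from repeated runs of affinity propagation or spectral clustering; here they are hard-coded.

```python
from itertools import combinations


def consensus_index(runs):
    """Map each co-sampled pair (i, j), i < j, to its consensus index.

    `runs` is a list of partitions; each partition is a dict
    {object id: cluster label} and may cover only a subsample.
    """
    together, sampled = {}, {}
    for labels in runs:
        for pair in combinations(sorted(labels), 2):
            sampled[pair] = sampled.get(pair, 0) + 1
            if labels[pair[0]] == labels[pair[1]]:
                together[pair] = together.get(pair, 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in sampled.items()}


def pac(runs, lower=0.1, upper=0.9):
    """Share of pairs with consensus index in (lower, upper] (assumed bounds)."""
    values = list(consensus_index(runs).values())
    return sum(lower < c <= upper for c in values) / len(values)


# Hypothetical base partitions of four objects (not data from the article).
runs_by_k = {
    2: [{0: 0, 1: 0, 2: 1, 3: 1}, {0: 1, 1: 1, 2: 0, 3: 0}],
    3: [{0: 0, 1: 0, 2: 1, 3: 2}, {0: 0, 1: 1, 2: 2, 3: 2}],
}
scores = {k: pac(runs) for k, runs in runs_by_k.items()}
k_star = min(scores, key=scores.get)  # the lowest PAC marks the most stable k
print(scores, k_star)
```

With two groups, every pair is always clustered either together or apart, so no pair is ambiguous. With three groups, two of the six pairs are together in only half of the runs, so PAC rises and the smaller number of groups is preferred.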

KEYWORDS

aggregated approach in taxonomy, stability, spectral clustering, affinity propagation method

JEL

C38

